Abstract: The proportional hazards assumption in the commonly used Cox model
for censored failure time data is often violated in scientific studies. Yang and
Prentice (2005) proposed a novel semiparametric two-sample model that includes
the proportional hazards model and the proportional odds model as sub-models
and accommodates crossing survival curves. The model leaves the baseline hazard
unspecified and the two model parameters can be interpreted as the short-term
and long-term hazard ratios. Inference procedures were developed based
on a pseudo score approach. Although extension to accommodate covariates was
mentioned, no formal procedures have been provided or proved. Furthermore,
the pseudo score approach may not be asymptotically efficient. We study the
extension of the short-term and long-term hazard ratio model of Yang and Prentice
(2005) to accommodate potentially time-dependent covariates. We develop
efficient likelihood-based estimation and inference procedures. The nonparametric
maximum likelihood estimators are shown to be consistent, asymptotically
normal, and asymptotically efficient. Extensive simulation studies demonstrate
that the proposed methods perform well in practical settings. The proposed
method captured the phenomenon of crossing hazards in a cancer clinical trial
and identified a genetic marker with a significant long-term effect on age-at-onset
of alcoholism in a genetic study, an effect that was missed by the proportional hazards model.

KEY WORDS: Semiparametric hazard rate model; Non-parametric likelihood; 
Proportional hazards model; Proportional odds model; Semiparametric efficiency 



1 Introduction 



Much of the modern statistical methodology for survival analysis involves the seminal work
of Cox (1972). The Cox proportional hazards model specifies that the hazard function of
the event time $T$ given a $p \times 1$ covariate vector $X$ takes the form

$$\lambda(t \mid X) = \lambda(t) e^{\beta^T X}, \qquad (1)$$

where $\lambda(t)$ is an unspecified baseline hazard function and $\beta$ is a $p \times 1$ vector of unknown
regression parameters. The assumption of constant relative risks over time in the Cox model,
however, is often violated in biomedical and genetic studies. For instance, crossing
hazards may be observed in clinical trials in which the treatment has certain adverse effects
initially but can be beneficial in the long run. In genetic studies, a certain gene may have a
large impact on the hazard for children shortly after birth, but may have a relatively small
impact later in life. In some other studies, genes related to susceptibility for a certain disease
may affect older people more than younger people.

A motivating example is from the Collaborative Study on the Genetics of Alcoholism
(COGA), a genetic family study with the aim of identifying and characterizing genetic factors
that affect the susceptibility to alcohol dependence and related phenotypes (Hasin 2003).
The investigators were particularly interested in assessing genetic effects on the age at onset
of ALDX1, the DSM-III-R + Feighner classification status for alcohol dependence. Recent
studies by Wang et al. (2006) and Diao and Lin (2010) suggested that SNP rs1972373 on
chromosome 14 might be a disease susceptibility locus. There are three possible genotypes,
'1/1', '1/2', and '2/2', at SNP rs1972373. Kaplan-Meier estimates of survival curves for the
three genotype groups presented in Figure 1 appear to overlap with each other before
the age of around 25; after that, the curve for '1/1' begins to show more separation from the ones
for the other two. In such situations, the proportional hazards model cannot distinguish
short-term and long-term genetic effects. Another interesting example involves data from a
randomized clinical trial on the treatment of locally unresectable gastric cancer (Gastrointestinal
Tumor Study Group 1982). The aim of this trial was to compare chemotherapy with
the combined chemotherapy and radiotherapy. As shown in Yang and Prentice (2005) and
Zeng and Lin (2007), the Kaplan-Meier survival curves for the two treatment groups cross
at around 1000 days, indicating crossing hazards. The proportional hazards model cannot
capture crossing hazards and could yield very misleading results in such situations.

When the assumption of proportional hazards is questionable, an alternative to the Cox
model is the proportional odds model (Bennett 1983; Murphy et al. 1997), which assumes
that the relative risk converges to one rather than remaining constant as time increases. The
survival function of $T$ given covariates $X$ under the proportional odds model takes the form

$$S(t \mid X) = \frac{e^{-\beta^T X}}{G(t) + e^{-\beta^T X}}, \qquad (2)$$

where $G(\cdot)$ is a strictly increasing function with $G(0) = 0$. Both the proportional hazards
and proportional odds models belong to the class of linear transformation models, which
relate an unknown monotone transformation of the failure time $T$ linearly to the covariates
$X$ (Bickel et al. 1993, Ch. 3; Zeng and Lin 2007). The phenomenon of crossing hazards,
however, cannot be directly captured by linear transformation models.



Yang and Prentice (2005) proposed a novel semiparametric two-sample hazard rate model
that accommodates crossing survival curves. Their model leaves the baseline distribution un- 
specified and the two model parameters have the appealing interpretations of the short-term 
and the long-term hazard ratios, respectively. The authors developed inference procedures 
based on a pseudo score approach and showed that the estimators are consistent and asymp- 
totically normal. Although extension to accommodate covariates was mentioned, no formal 
procedures have been provided or proved. In addition, the pseudo score approach may not 
be asymptotically efficient. 

In this paper, we study the extension of the two-sample semiparametric hazard rate model
of Yang and Prentice (2005) to accommodate covariates. Furthermore, the covariates
can be potentially time-dependent. We develop efficient likelihood-based estimation and
inference procedures. The estimators are shown to be consistent, asymptotically normal, 
and asymptotically efficient. 

The rest of the paper is organized as follows. In Section 2, we introduce the semiparametric
hazard rate model accommodating potentially time-dependent covariates and formulate
the nonparametric likelihood function. In Section 3, we describe the model assumptions and
derive the asymptotic results. Extensive simulation studies are presented in Section 4 to
examine the finite sample properties of the proposed method. In Section 5, we illustrate 
the new model through the applications to the gastric cancer trial and the COGA study 
mentioned before. We conclude with a brief discussion in Section 6. Proofs of the theoretical 
results are provided in the Appendix. 






2 Models and Inference 



Suppose that there is a random sample of $n$ independent subjects. For the $i$th subject, let
$T_i$ be the failure time, $C_i$ be the censoring time, and $X_i$ be a $p \times 1$ vector of (time-invariant)
covariates. The data consist of $\{Y_i = \min(T_i, C_i), \Delta_i = I(T_i \le C_i), X_i, i = 1, \ldots, n\}$, where
$I(\cdot)$ is the indicator function. Let $\tau$ be a constant denoting the end of the study. We assume
that $T_i$ and $C_i$ are independent given $X_i$. We also assume that
$P(C_i \ge \tau \mid X_i) = P(C_i = \tau \mid X_i) > 0$.

To incorporate short-term and long-term covariate effects, Yang and Prentice (2005)
discussed the following semiparametric hazard rate model

$$\lambda(t \mid X_i) = \frac{e^{(\beta+\gamma)^T X_i}}{e^{\beta^T X_i} F(t) + e^{\gamma^T X_i} S(t)} \lambda(t), \qquad (3)$$

where $\lambda(t \mid X_i)$ is the hazard function of the event time $T_i$ given $X_i$, $\lambda(t)$ is the baseline
hazard function, $S(t) = \exp\{-\int_0^t \lambda(s)\,ds\}$ is the baseline survival function, $F(t) = 1 - S(t)$
is the baseline cumulative distribution function, and $\beta$ and $\gamma$ are two vectors of unknown
regression parameters. The baseline cumulative hazard function $\Lambda(t) = \int_0^t \lambda(s)\,ds$ is left
unspecified. Under this model, the hazard ratios between two sets of covariate values are
allowed to be non-constant over time. In particular, we can show that

$$\lim_{t \to 0} \frac{\lambda(t \mid X_1)}{\lambda(t \mid X_2)} = e^{\beta^T (X_1 - X_2)}, \qquad \lim_{t \to \tau} \frac{\lambda(t \mid X_1)}{\lambda(t \mid X_2)} = e^{\gamma^T (X_1 - X_2)},$$

assuming the existence of the limits, where $\tau = \sup\{t : S(t) > 0\}$. Therefore, the parameters
$e^{\beta}$ and $e^{\gamma}$ can be interpreted as the short-term and long-term hazard ratios, respectively.
Moreover, model (3) includes the proportional hazards and proportional odds models as
two sub-models, with $\beta = \gamma$ for the proportional hazards model (1), and $\gamma = 0$ for the
proportional odds model (2).
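
As a numerical illustration of the two limits above, the following minimal sketch evaluates the hazard ratio under model (3) for a scalar covariate. The baseline $\Lambda(t) = t$, the function name `hazard_ratio`, and the parameter values are assumptions chosen only for illustration; they are not taken from the fitted examples.

```python
import math


def hazard_ratio(t, beta, gamma, x1, x2):
    """Hazard ratio lambda(t | x1) / lambda(t | x2) under model (3).

    Assumes a scalar covariate and baseline cumulative hazard Lambda(t) = t,
    so S(t) = exp(-t) and F(t) = 1 - S(t). The baseline hazard cancels
    in the ratio.
    """
    s = math.exp(-t)
    f = 1.0 - s

    def relative_hazard(x):
        numerator = math.exp((beta + gamma) * x)
        denominator = math.exp(beta * x) * f + math.exp(gamma * x) * s
        return numerator / denominator

    return relative_hazard(x1) / relative_hazard(x2)


# Illustrative values with short-term and long-term effects of opposite sign.
beta, gamma = -0.5, 0.5
for t in (0.0, 0.5, 1.0, 2.0, 5.0, 30.0):
    ratio = hazard_ratio(t, beta, gamma, 1.0, 0.0)
    print(f"t = {t:5.1f}  hazard ratio = {ratio:.4f}")
```

With these values the ratio starts at $e^{\beta}$ at $t = 0$ and approaches $e^{\gamma}$ for large $t$, so it crosses one in between; setting $\beta = \gamma$ gives a constant ratio, as in the proportional hazards model.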

We extend model (3) to allow time-dependent covariates. Let $X_i(\cdot)$ be a $p \times 1$ vector of
(possibly time-dependent) covariates. Also let $\bar{X}_i(t)$ denote the history of $X_i(\cdot)$ over $[0, t]$.
We assume that the time-dependent covariates are external and that $X_i(\cdot)$ are bounded
right-continuous functions with bounded right derivatives in $[0, \tau]$ with probability one. We
specify that the cumulative hazard function conditional on $\bar{X}_i(t)$ takes the form

$$\Lambda(t \mid \bar{X}_i(t)) = \int_0^t \frac{e^{(\beta+\gamma)^T X_i(s)}}{e^{\beta^T X_i(s)} F(s) + e^{\gamma^T X_i(s)} S(s)}\, d\Lambda(s), \qquad (4)$$

where $\Lambda(t)$, $S(t)$, $F(t)$, $\beta$, and $\gamma$ have the same interpretation as those under model (3).

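Our goal is to make inference about the parameters $\theta = (\beta, \gamma)$ and the function $\Lambda(t)$. Under
the assumption of conditional independent censoring, the likelihood for $(\theta, \Lambda)$ takes the form

$$\prod_{i=1}^{n} \left[ \frac{e^{(\beta+\gamma)^T X_i(Y_i)} \Lambda'(Y_i)}{e^{\beta^T X_i(Y_i)} F(Y_i) + e^{\gamma^T X_i(Y_i)} S(Y_i)} \right]^{\Delta_i} e^{-\Lambda(Y_i \mid \bar{X}_i(Y_i))},$$

where $\Lambda'(t)$ is the first derivative of $\Lambda(t)$.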

In order to estimate the unknown parameters, we need to maximize the observed-data
likelihood. However, this maximum does not exist because one can always choose $\Lambda'(Y_i) = \infty$
for some $Y_i$ with $\Delta_i = 1$. Thus, we take a nonparametric maximum likelihood approach,
in which $\Lambda$ is allowed to be a right-continuous function. Specifically, we replace $\Lambda'(Y_i)$
with $\Lambda\{Y_i\}$, the jump size of $\Lambda(\cdot)$ at $Y_i$. Therefore, we obtain the following nonparametric
likelihood function

$$L_n(\theta, \Lambda) = \prod_{i=1}^{n} \left[ \frac{e^{(\beta+\gamma)^T X_i(Y_i)} \Lambda\{Y_i\}}{e^{\beta^T X_i(Y_i)} F(Y_i) + e^{\gamma^T X_i(Y_i)} S(Y_i)} \right]^{\Delta_i} e^{-\Lambda(Y_i \mid \bar{X}_i(Y_i))}. \qquad (5)$$

We maximize the nonparametric log-likelihood function $l_n(\theta, \Lambda) = \log L_n(\theta, \Lambda)$. The resultant
nonparametric maximum likelihood estimators (NPMLEs) are denoted by $(\hat{\theta}_n, \hat{\Lambda}_n)$. It is easy
to show that $\hat{\Lambda}_n$ must be a step function with positive jumps only at the $Y_i$'s for which $\Delta_i = 1$.
We order the distinct observed failure times as $(Y_{(1)}, \ldots, Y_{(m)})$, where $m$ is the total number
of distinct observed failure times. Therefore, the above maximization should be performed
over the parameters $\theta$ and these positive jumps. The cumulative hazard function $\Lambda(t \mid \bar{X}_i(t))$
in (4) takes the form

$$\sum_{k:\, Y_{(k)} \le t} \frac{e^{(\beta+\gamma)^T X_i(Y_{(k)})}}{e^{\beta^T X_i(Y_{(k)})} F(Y_{(k)}) + e^{\gamma^T X_i(Y_{(k)})} S(Y_{(k)})}\, \Lambda\{Y_{(k)}\}.$$
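
The discretized likelihood can be sketched in a few lines. The code below is a minimal illustration, not the authors' C implementation: it assumes a scalar time-invariant covariate, takes the jump times in increasing order, and evaluates $F$ and $S$ at the right-continuous value of $\Lambda$ at each jump time (the text does not state which convention is used). The function name `np_loglik` and its arguments are hypothetical.

```python
import math


def np_loglik(beta, gamma, y, delta, x, jump_times, jumps):
    """Nonparametric log-likelihood with Lambda a step function.

    y, delta, x : observed times, event indicators, scalar covariates.
    jump_times  : distinct observed failure times, sorted increasingly.
    jumps       : positive jump sizes Lambda{Y_(k)} at those times.
    Assumption: F and S are evaluated at the right-continuous Lambda.
    """
    # Baseline cumulative hazard Lambda(Y_(k)) at each jump time.
    cumulative = []
    total = 0.0
    for jump in jumps:
        total += jump
        cumulative.append(total)

    loglik = 0.0
    for yi, di, xi in zip(y, delta, x):
        a, b = math.exp(beta * xi), math.exp(gamma * xi)
        conditional_cum_hazard = 0.0
        for tk, jk, ck in zip(jump_times, jumps, cumulative):
            if tk > yi:
                break
            s = math.exp(-ck)
            f = 1.0 - s
            weight = a * b / (a * f + b * s)
            conditional_cum_hazard += weight * jk
            if di == 1 and tk == yi:
                # Contribution of the hazard at an observed failure.
                loglik += math.log(weight * jk)
        loglik -= conditional_cum_hazard
    return loglik
```

A quasi-Newton optimizer would maximize this function over $(\beta, \gamma)$ and the jump sizes, for example after reparameterizing the jumps on the log scale so that they stay positive.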



To compute the NPMLEs, we use the quasi-Newton algorithm described in Chapter 10 of
Press et al. (1992). Specifically, we use the Broyden-Fletcher-Goldfarb-Shanno (BFGS)
method, which is one of the most efficient methods for solving nonlinear optimization problems,
and was proposed by Broyden (1970), Fletcher (1970), Goldfarb (1970), and Shanno
(1970) individually. The BFGS method and its variants have been implemented in standard
software such as SAS, R, and Matlab and have been successfully used in the literature. To ensure
the stability of the quasi-Newton algorithm, we suggest centering covariates at their means.
When we constrain the regression parameters such that $\beta = \gamma$, the quasi-Newton algorithm
yields exactly the same parameter estimates as those from the procedure phreg in SAS
and the R routine coxph under the proportional hazards model; when we constrain
$\gamma = 0$, the NPMLEs obtained from the quasi-Newton algorithm are the same as those from
the R routine nltm under the proportional odds model. These results provide an empirical
validation of the quasi-Newton algorithm.

In the next section, we will establish consistency and asymptotic normality of the NPMLEs.
We will show that the asymptotic covariance matrix for $\hat{\theta}_n$ attains the semiparametric
efficiency bound and can be consistently estimated using the inverse of the observed Fisher
information matrix for all parameters including $\theta$ and the jump sizes of $\hat{\Lambda}_n$. Alternatively,
following the argument of Murphy and van der Vaart (2000), we can estimate the covariance
matrix of $\hat{\theta}_n$ by using the profile likelihood function for $\theta$, which is defined as the maximum
of $L_n(\theta, \Lambda)$ over $\Lambda$ for any fixed $\theta$. Our simulation studies indicated that both approaches
work very well in practical situations.

The formulation of the semiparametric hazard rate model provides an appealing diagnostic
tool for testing the proportional hazards and proportional odds models since the latter
two models are embedded in the former. Specifically, we can check the proportional hazards
and proportional odds assumptions by testing $H_0 : \beta = \gamma$ and $H_0 : \gamma = 0$, respectively. This
can be done by the Wald, score, or likelihood ratio statistics.

3 Asymptotic Properties 

Let $\theta_0 = (\beta_0, \gamma_0)$ and $\Lambda_0$ denote the true values of $\theta$ and $\Lambda$. We impose the following
regularity conditions:

(C1) With probability one, the covariates $X_i(\cdot)$ possess bounded total variation in $[0, \tau]$ and
the support of $X_i$ contains 0. In addition, if there exists a function $c_0(t)$ and a constant
vector $c_1$ such that

$$c_1^T X(t) = c_0(t), \quad \forall t \in [0, \tau]$$

with probability one, then $c_0(t) = 0$ and $c_1 = 0$.

(C2) Conditional on $X_i$, the censoring time $C_i$ is independent of the failure time $T_i$.

(C3) There exists some positive constant $\delta_0$ such that $P(C_i \ge \tau \mid X_i) = P(C_i = \tau \mid X_i) > \delta_0$
almost surely, where $\tau$ is a constant denoting the end of the study.

(C4) The true parameter value of $\theta$, $\theta_0$, belongs to a known compact set $\mathcal{B}_0$ in $R^{2p}$.

(C5) The true baseline cumulative hazard function $\Lambda_0$ belongs to the following class:
$\mathcal{A}_0 = \{\Lambda : \Lambda$ is a strictly increasing function in $[0, \tau]$ and is continuously differentiable
with $\Lambda(0) = 0$, $\Lambda'(0) > 0$, and $\Lambda(\tau) < \infty\}$.

All the above assumptions are standard in the semiparametric analysis of failure time
data. Under these assumptions, we first show that the NPMLEs $(\hat{\theta}_n, \hat{\Lambda}_n)$ exist. It suffices
to show that the jump size of $\hat{\Lambda}_n$ at $Y_i$ for which $\Delta_i = 1$ is finite. By the compactness of $\theta$,
$F$, $S$, and $X_i$, $i = 1, \ldots, n$, we have

$$L_n(\theta, \Lambda) \le \prod_{i=1}^{n} c_1 \Lambda\{Y_i\}^{\Delta_i} e^{-c_2 \Lambda\{Y_i\}}$$

for some positive constants $c_1$ and $c_2$. Thus, if $\Lambda\{Y_i\} \to \infty$ for some $i$ such that $\Delta_i = 1$, then
$L_n(\theta, \Lambda) \to 0$. We conclude that the jump sizes of $\hat{\Lambda}_n$ must be finite. On the other hand, $\theta$
belongs to a compact set $\mathcal{B}_0$. It follows that the NPMLEs exist.

We next establish identifiability of the model parameters $(\theta, \Lambda)$.

Lemma 1. Under conditions (C1)-(C5), the parameters $\theta$ and $\Lambda$ are identifiable.

The proof of Lemma 1 is given in Appendix A.1. Using Lemma 1, we are able to obtain
the following consistency results.

Theorem 1. Under conditions (C1)-(C5), $\|\hat{\theta}_n - \theta_0\| \to 0$ and
$\sup_{t \in [0, \tau]} |\hat{\Lambda}_n(t) - \Lambda_0(t)| \to 0$ almost surely, where $\|\cdot\|$ is the Euclidean norm.

Remark 1. Theorem 1 states the consistency of the NPMLEs. The basic idea of the proof of
Theorem 1 is as follows. As in the proof of the existence of the NPMLEs, we will show that
$\hat{\Lambda}_n(\tau)$ is not allowed to diverge. Once the boundedness of $\hat{\Lambda}_n(\tau)$ is established, a subsequence
of $\hat{\Lambda}_n$ can be found to converge pointwise to a bounded monotone function $\Lambda^*$ in $[0, \tau]$ and the
same subsequence of $\hat{\theta}_n$ converges to some $\theta^*$. We construct a step function $\tilde{\Lambda}_n$ with jumps
at the observed failure times converging to $\Lambda_0$. Then, because $L_n(\hat{\theta}_n, \hat{\Lambda}_n) \ge L_n(\theta_0, \tilde{\Lambda}_n)$,
by taking the limit, we will prove that the Kullback-Leibler information between the true
density and the density indexed by $(\theta^*, \Lambda^*)$ is non-positive. Therefore, the true density
must be equal to the density indexed by $(\theta^*, \Lambda^*)$. The consistency will then follow from the
identifiability result. The details of the proof are given in Appendix A.2.

Our last theorem establishes the asymptotic properties of the NPMLEs.

Theorem 2. Under conditions (C1)-(C5), the random element $\sqrt{n}(\hat{\theta}_n - \theta_0, \hat{\Lambda}_n - \Lambda_0)$
converges weakly to a zero mean Gaussian process in the metric space $l^{\infty}(\mathcal{H})$, where

$$\mathcal{H} = \{(h_1, h_2, h_3) : h_1 \in R^p, h_2 \in R^p, h_3 \text{ is a function on } [0, \tau];\ \|h_1\| \le 1, \|h_2\| \le 1, |h_3|_V \le 1\}$$

and $|h_3|_V$ denotes the total variation of $h_3$ in $[0, \tau]$. Furthermore, $\hat{\theta}_n$ is asymptotically
efficient.

Remark 2. In the statement of Theorem 2, asymptotically efficient estimators mean that
the asymptotic covariances attain the semiparametric efficiency bounds as defined in Bickel
et al. (1993, Ch. 3). Once the consistency of the NPMLEs is established, the asymptotic
distribution of the NPMLEs stated in Theorem 2 can be derived by verifying the four conditions
in Theorem 3.3.1 of van der Vaart and Wellner (1996). The proof of Theorem 2 is
given in Appendix A.3.

Remark 3. Theorem 2 implies that for any $(h_1, h_2, h_3) \in \mathcal{H}$,
$\sqrt{n}(\hat{\beta}_n - \beta_0)^T h_1 + \sqrt{n}(\hat{\gamma}_n - \gamma_0)^T h_2 + \sqrt{n} \int_0^{\tau} h_3(t)\, d(\hat{\Lambda}_n - \Lambda_0)(t)$
is asymptotically normal with mean zero and variance
$\mathrm{Var}(\Psi[h_1, h_2, h_3])$, and this normal approximation is uniform in $(h_1, h_2, h_3)$, where
$\Psi \in l^{\infty}(\mathcal{H})$ is the random element in the limiting distribution. Therefore, to estimate the variance
of $(\hat{\beta}_n, \hat{\gamma}_n, \hat{\Lambda}_n)$, we view (5) as a parametric likelihood with $\beta$, $\gamma$, and the jump sizes of $\Lambda$
at the observed failure times as parameters. We can then estimate the asymptotic variance
matrix of the unknown parameters by inverting the observed information matrix according
to the parametric likelihood theory.

4 Simulation Studies 

We conducted extensive simulation studies to evaluate the finite sample performance of the
proposed methodology using 1000 replicates. We generated failure times from the following
model

$$\Lambda(t \mid X_i) = \int_0^t \frac{e^{(\beta+\gamma) X_i}}{e^{\beta X_i} F(s) + e^{\gamma X_i} S(s)}\, d\Lambda(s),$$

where $X_i$ is a uniform$(-1, 1)$ variable. The baseline cumulative hazard function is set to be
$\Lambda(t) = t$. We consider four scenarios for the values of regression parameters: (a) $(\beta, \gamma) = (-0.5, 0.5)$;
(b) $(\beta, \gamma) = (-0.5, 0)$; (c) $(\beta, \gamma) = (0, 0.5)$; and (d) $(\beta, \gamma) = (0.5, 0.5)$. Under
scenario (a), the short-term and long-term hazard ratios are in opposite directions; under
scenario (b), the long-term hazard ratio is 1, corresponding to a true proportional odds model;
under scenario (c), the short-term hazard ratio is 1; and under scenario (d), the short-term
and long-term hazard ratios are equal, corresponding to a true proportional hazards model.
The censoring time is set to be the minimum of 2 and a uniform$(0, 4)$ variable, producing
approximately 29% censoring under all four scenarios. We used the quasi-Newton algorithm
(Press et al. 1992) to calculate the NPMLEs. There is little difference between the standard
error estimates through the Fisher information matrix and those from the profile likelihood
approach. We present the standard error estimates based on the observed Fisher information
matrix throughout the simulation studies and real data applications.
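
The data-generating mechanism above can be sketched as follows. With $\Lambda(t) = t$ the integral has the closed form $\Lambda(t \mid x) = b \log\{1 + (a/b)(e^t - 1)\}$, where $a = e^{\beta x}$ and $b = e^{\gamma x}$, so failure times can be drawn by inverting the conditional survival function. This closed form is derived here rather than quoted from the paper, and the function names and the seed are hypothetical choices for illustration.

```python
import math
import random


def cumulative_hazard(t, x, beta, gamma):
    """Closed form of Lambda(t | x) for the simulation model with Lambda(t) = t."""
    a, b = math.exp(beta * x), math.exp(gamma * x)
    return b * math.log1p((a / b) * math.expm1(t))


def draw_failure_time(x, beta, gamma, rng):
    """Invert S(t | x) = exp(-Lambda(t | x)) at a uniform draw."""
    a, b = math.exp(beta * x), math.exp(gamma * x)
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return math.log1p((b / a) * math.expm1(-math.log(u) / b))


def simulate(n, beta, gamma, seed=0):
    """Return a list of (observed time, event indicator, covariate) triples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        t = draw_failure_time(x, beta, gamma, rng)
        c = min(2.0, rng.uniform(0.0, 4.0))  # censoring time
        data.append((min(t, c), int(t <= c), x))
    return data


sample = simulate(200, beta=-0.5, gamma=0.5, seed=1)
censored = 1.0 - sum(event for _, event, _ in sample) / len(sample)
print(f"proportion censored: {censored:.3f}")
```

The printed censoring proportion can be compared with the approximately 29% stated above; it varies from sample to sample.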

Table 1 summarizes the results for $\beta$, $\gamma$, and $\Lambda(t)$ with $n = 100$ and $n = 200$. For the
nonparametric estimation of $\Lambda(t)$, we evaluated its estimates at $t = 0.5$ and $t = 1.0$. For
comparison, we also fit the proportional hazards and proportional odds models, for which
the regression parameters are denoted by $\beta_{PH}$ and $\beta_{PO}$, respectively. The results in Table
1 indicate that the proposed method performs well for small sample sizes. In particular, the
proposed estimators appear to be unbiased. The standard error estimator accurately reflects
the true variation, and the confidence intervals have proper coverage probabilities. When
the proportional hazards assumption is violated, the Cox model leads to biased estimates.
In particular, the results based on the Cox model can be very misleading when the short-term
and long-term covariate effects are in opposite directions. Similar results were observed for
the proportional odds model when the model assumption is not true. When the Cox model
or the proportional odds model holds, as expected, the proposed NPMLEs are less efficient
than those obtained under the true sub-model.

Our next set of studies evaluated the proposed inference procedures for the testing of
covariate effects and the assumptions of proportional hazards and proportional odds. Specifically,
we considered Wald tests for the following null hypotheses: (H1) $H_0 : \beta = 0$; (H2)
$H_0 : \gamma = 0$; (H3) $H_0 : \beta = \gamma = 0$; and (H4) $H_0 : \beta = \gamma$. Note that testing the long-term
hazard ratio is equivalent to testing the proportional odds model. For comparison, we
also considered the testing of covariate effects under the proportional hazards model: (H5)
$H_0 : \beta_{PH} = 0$. We used the same simulation setting as above with $n = 200$. Table 2 presents
the sizes/powers of the Wald tests at the nominal level of 0.05. In all cases, the proposed
tests have accurate control of type I error rates and reasonable powers under the alternative.
The proposed tests of short-term, long-term, and overall covariate effects tend to be more
powerful than the Cox model when the proportional hazards assumption is violated. When
our interest is to test the short-term or long-term hazard ratio only, the Cox model tends to
yield inflated type I error rates under model misspecification.
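
For a scalar covariate, the Wald tests of (H1)-(H4) reduce to simple calculations once the estimates of $(\beta, \gamma)$ and their covariance matrix are available. The sketch below is illustrative only: the estimates and the covariance matrix are hypothetical numbers, not output of the proposed method, and the function names are assumptions.

```python
import math


def wald_contrast_test(estimates, cov, contrast):
    """Two-sided Wald test of H0: contrast' theta = 0 (one degree of freedom)."""
    k = len(contrast)
    value = sum(c * e for c, e in zip(contrast, estimates))
    variance = sum(contrast[i] * cov[i][j] * contrast[j]
                   for i in range(k) for j in range(k))
    z = value / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def wald_joint_test(estimates, cov):
    """Wald test of H0: beta = gamma = 0 for two scalar parameters.

    The statistic theta' cov^{-1} theta is referred to a chi-square
    distribution with 2 degrees of freedom, whose upper tail is exp(-w / 2).
    """
    e0, e1 = estimates
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    w = (e0 * e0 * cov[1][1] - 2.0 * e0 * e1 * cov[0][1] + e1 * e1 * cov[0][0]) / det
    return w, math.exp(-w / 2.0)


# Hypothetical estimates of (beta, gamma) and their covariance matrix.
theta_hat = [-0.45, 0.55]
cov_hat = [[0.04, -0.01], [-0.01, 0.09]]
print("H1 beta = 0:      ", wald_contrast_test(theta_hat, cov_hat, [1.0, 0.0]))
print("H2 gamma = 0:     ", wald_contrast_test(theta_hat, cov_hat, [0.0, 1.0]))
print("H3 beta=gamma=0:  ", wald_joint_test(theta_hat, cov_hat))
print("H4 beta = gamma:  ", wald_contrast_test(theta_hat, cov_hat, [1.0, -1.0]))
```

The contrast $(1, -1)$ used for (H4) is the test of the proportional hazards assumption, and the contrast $(0, 1)$ used for (H2) is the test of the proportional odds assumption.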

We carried out additional simulation studies to compare the efficiency of the proposed
NPMLEs relative to the pseudo-maximum likelihood estimators for two-sample data as
implemented by Yang and Prentice (2005). We considered the same simulation settings as
above except that $X_i$ is a binary variable taking values $-0.5$ and $0.5$ with equal probabilities.
Table 3 presents the empirical mean squared errors for estimating $\beta$ and $\gamma$ based on
1,000 repetitions. As expected, under almost all situations the proposed estimators are more
efficient than the pseudo-maximum likelihood estimators.

5 Real Data Examples
5.1 COGA study 

In the COGA study mentioned previously, 643 individuals were affected with alcoholism 
and 971 individuals were disease-free at the time of interview. After excluding individuals 
with missing genotype at the target gene locus or phenotype data, the final data set for our 
analysis consisted of 1,371 individuals, including 626 affected individuals and 745 unaffected 
individuals. 

Preliminary analysis revealed that gender was a risk factor for alcoholism; males were at
a higher risk than females. Of the 626 affected individuals, 424 were males, as opposed to
229 males among the unaffected individuals. Previous linkage analysis showed a linked region on
chromosome 14 (Palmer et al. 1999). Two recent studies on the genetic association analysis
of ordinal traits (Wang et al. 2006; Diao and Lin 2010) suggested that SNP rs1972373 on
chromosome 14 might be a disease susceptibility locus. Based on the Kaplan-Meier estimates
of survival curves for the three genotype groups at SNP rs1972373 presented in Figure 2,
allele '2' appeared to have little short-term impact but strong long-term impact on the risk
of alcoholism.

In our analysis, we fit the proposed model (4) and included gender and genotype score at
SNP rs1972373 as covariates. The gender of an individual was coded as 1 for male and 0 for
female, and the genotype score was coded as the number of copies of allele type '2'. Both covariates
were then centered at their means. The tests of the proportional hazards assumption for
gender and genotype score at SNP rs1972373 were significant with p-values of 0.016 and
0.027. Gender appeared to have significant short-term and long-term effects on the age-at-onset
of alcoholism. The short-term and long-term log-hazard ratios of male versus female
are estimated at 0.866 and 1.9932 with standard error estimates of 0.147 and 0.367, both
leading to p-values less than 0.0001. As expected, SNP rs1972373 appeared to have no
short-term effect but a significant long-term effect on the age-at-onset of alcoholism. The
short-term log-hazard ratio of allele type '2' versus allele type '1' is estimated at -0.06 with
a p-value of 0.479, whereas the long-term log-hazard ratio is estimated at 0.683 with a
p-value of 0.015. One copy of allele type '2' in the genotype at SNP rs1972373 is expected
to increase the long-term hazard of alcoholism by 98% with a 95% confidence interval of
(14%, 243%). Figure 1 plots the separate Kaplan-Meier and the model-fitted survival curves
for each genotype group. The model-fitted survival function is calculated as the empirical
average of the predicted survival functions. The close agreement between the predicted survival
functions and the nonparametric Kaplan-Meier estimates of the survival curves indicates a good fit of
the model. In contrast, the Cox model failed to detect the long-term effect of SNP rs1972373.
The log-hazard ratio estimated from the Cox model is 0.083 with a standard error estimate
of 0.058, corresponding to a p-value of 0.153.

5.2 Gastrointestinal tumor study 

As mentioned in the Introduction section, the gastrointestinal tumor study compared chemother- 
apy with the combined chemotherapy and radiotherapy on the treatment of locally unre- 
sectable gastric cancer. There were 45 patients randomly assigned to each treatment arm. 
Two observations were censored in the chemotherapy group and six were censored in the 
combined therapy group. Under the two-sample proportional hazards model, the log-hazard
ratio of chemotherapy versus the combined therapy is estimated at 0.106 with a standard
error estimate of 0.223, yielding a p-value of 0.635. The proportional hazards model
failed to capture the phenomenon of crossing survival curves shown in Figure 1, and its
results were therefore meaningless in this situation.

We fit the proposed model (4) by letting $X_i = 0.5$ for the combined therapy group and
$X_i = -0.5$ for the chemotherapy group. The test of the proportional hazards assumption
is highly significant with a p-value of $6.0 \times 10^{-4}$. The new method successfully captured
the phenomenon of crossing hazards. The short-term log-hazard ratio $\beta$ and long-term
log-hazard ratio $\gamma$ are in opposite directions and estimated at 1.76 and -1.59 with standard error
estimates of 0.582 and 0.509, leading to p-values of 0.0025 and 0.0018, respectively. The 95%
confidence intervals are (0.62, 2.90) for $\beta$ and (-2.59, -0.59) for $\gamma$. The estimated short-term
and long-term hazard ratios are 5.81 and 0.20 with 95% confidence intervals (1.86, 18.17) and
(0.075, 0.553). As evident in Figure 2, the model-fitted survival curves agree well with the
nonparametric Kaplan-Meier survival estimates, indicating a good model fit. Our
results are also consistent with the results from the two-sample model of Yang and Prentice
(2005) using the pseudo maximum likelihood approach.
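
The reported p-values and intervals can be checked from the estimates and standard errors alone. The sketch below assumes that they are Wald-type quantities based on a normal approximation with critical value 1.96, which the text does not state explicitly; the function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(estimate, std_err, z_crit=1.96):
    """Wald p-value and confidence intervals on the log and ratio scales."""
    z = estimate / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    lower = estimate - z_crit * std_err
    upper = estimate + z_crit * std_err
    return {
        "p_value": p_value,
        "ci_log": (lower, upper),
        "hazard_ratio": math.exp(estimate),
        "ci_ratio": (math.exp(lower), math.exp(upper)),
    }


# Estimates and standard errors reported for the gastric cancer trial.
for label, est, se in (("short-term", 1.76, 0.582), ("long-term", -1.59, 0.509)):
    print(label, wald_summary(est, se))
```

Up to rounding of the published estimates, this calculation gives values close to those reported above.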



6 Discussion 



We have extended the two-sample semiparametric hazard rate model of Yang and Prentice
(2005) to incorporate short-term and long-term effects of potentially time-dependent covariates.
We have studied the nonparametric maximum likelihood estimation for the proposed
model (3) and established the asymptotic properties for the NPMLEs. Unlike the existing
varying-coefficient Cox model, the estimation and inference procedures are likelihood-based
and statistically efficient. Numerical studies and the applications to the gastrointestinal tumor
study and the COGA study demonstrate that the proposed inference procedures perform
well in practical situations.

We have implemented the new method in the C language using the quasi-Newton algorithm
described in Press et al. (1992). The convergence of the quasi-Newton algorithm is very fast
and it takes less than 0.2 seconds to analyze one data set with 400 subjects on a Dell PowerEdge
2900 server. The efficiency of our computer program makes it feasible to apply our
method to gene expression data and genome-wide association studies. Our user-friendly computer
program is freely available on the website: http://mason.gmu.edu/~gdiao/software/.

For the purpose of illustration, we assume that observations in the COGA study are
independent. Although the failure times within the same family tend to be correlated, the
NPMLEs $\hat{\theta}_n$ can be shown to be consistent for $\theta_0$ and asymptotically normally distributed
provided that the marginal model is correctly specified. However, the naive covariance
matrix estimator for $\hat{\theta}_n$, using the inverse of the observed Fisher information matrix, is
no longer valid in the presence of within-family dependence. To account for within-family
correlations, one option is to fit marginal models and then use the robust sandwich estimators
of the covariance matrix. For the COGA data, the naive and robust covariance estimates were
very close, suggesting weak within-family correlations. Currently we are investigating the
extensions of the semiparametric hazard rate model (4) to correlated failure time data by
using random effects.

To assess the adequacy of the semiparametric hazard rate model (4), we can develop a
goodness-of-fit procedure based on martingale residuals. The martingale under model (4)
can be written as

$$M_i(t) = N_i(t) - \int_0^t Y_i(s) \frac{e^{(\beta+\gamma)^T X_i(s)}}{e^{\beta^T X_i(s)} F(s) + e^{\gamma^T X_i(s)} S(s)}\, d\Lambda(s),$$

where $N_i(t)$ and $Y_i(t)$ are the usual counting process and at-risk process. The score process
for $\theta$, seen as a function of time and denoted by $U(t; \theta, \Lambda)$, can be expressed as a function of
the martingale residuals, with integrands that involve weights such as

$$\frac{e^{\gamma^T X_i(s)} S(s)}{e^{\beta^T X_i(s)} F(s) + e^{\gamma^T X_i(s)} S(s)}.$$

Under model (4), $U(t; \hat{\theta}_n, \hat{\Lambda}_n)$ are expected to fluctuate randomly around 0. Therefore, along
the line of Lin et al. (1993), we can construct an alternative goodness-of-fit test for the $j$th
covariate based on the test statistic

$$K_j = \sup_{t \in [\delta, \tau - \delta]} U_j(t; \hat{\theta}_n, \hat{\Lambda}_n)\, \mathrm{Cov}^{-1}\{U_j(t; \hat{\theta}_n, \hat{\Lambda}_n)\}\, U_j(t; \hat{\theta}_n, \hat{\Lambda}_n), \quad j = 1, \ldots, p,$$

where $\delta$ is a small positive number to avoid numerical problems at the edges, and $U_j(\cdot)$ is
the score process for the $j$th covariate. Similar to Lin et al. (1993), the null distribution of
the above test statistic can be evaluated using a resampling approach and the p-value may
be approximated by the empirical proportion of the realizations of the null distribution
exceeding $K_j$. The theoretical justification of this procedure, however, is challenging since
the partial likelihood function is not available under model (4). We are currently investigating
this type of goodness-of-fit procedure for general semiparametric survival models including
model (4).

To accommodate time-varying covariate effects on survival outcomes, one can also extend
the Cox model (1) through the use of time-varying regression coefficients such that

$$\lambda(t \mid X) = \lambda(t) e^{\beta(t)^T X},$$

where $\beta(t)$ is a $p \times 1$ vector of unspecified functions of $t$. Estimation and inference procedures
for this so-called varying-coefficient Cox model have been investigated by several authors,
including Zucker and Karr (1990), Murphy and Sen (1991), Murphy (1993), Martinussen et al.
(2002), Cai and Sun (2003), Winnett and Sasieni (2003), Tian et al. (2005), and Peng and
Huang (2007), among others. In general, nonparametric smoothing is required to estimate
the time-varying coefficients. Note that for the case when $X$ is a one-dimensional binary
covariate, as in two-arm clinical trials, the time-varying regression coefficient model is
completely nonparametric and places no restriction on the relationship between the two samples. For
general $k$-dimensional covariates, though, it may be interesting to compare the performance
of the proposed method with that of the methods based on the varying-coefficient Cox model.

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We introduce some notations that will be used throughout the appendix. Let O, denote 
the observations for the ith subject consisting of (1$, A i: Xj). Let P n and P be the empirical 
measure and the expectation of n i.i.d. observations Oi, O n . That is, for any measurable 
function g(0), 



APPENDIX 



Pnfo(O)] = -5>(O0, P[g(0)] = E\g(0)]. 



i=l 



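A.1. Proof of Lemma 1. Suppose that two sets of parameters, $(\theta, \Lambda)$ and $(\tilde\theta, \tilde\Lambda)$, give
the same likelihood function for the observed data, i.e.,

$$\left[\frac{e^{(\beta+\gamma)^T X(Y)}\, \Lambda'(Y)}{e^{\beta^T X(Y)} F(Y) + e^{\gamma^T X(Y)} S(Y)}\right]^{\Delta} e^{-\Lambda(Y \mid X(Y))}
= \left[\frac{e^{(\tilde\beta+\tilde\gamma)^T X(Y)}\, \tilde\Lambda'(Y)}{e^{\tilde\beta^T X(Y)} \tilde F(Y) + e^{\tilde\gamma^T X(Y)} \tilde S(Y)}\right]^{\Delta} e^{-\tilde\Lambda(Y \mid X(Y))}, \qquad (6)$$

where $S(t) = e^{-\Lambda(t)}$, $F(t) = 1 - S(t)$, and

$$\Lambda(t \mid X(t)) = \int_0^t \frac{e^{(\beta+\gamma)^T X(s)}}{e^{\beta^T X(s)} F(s) + e^{\gamma^T X(s)} S(s)}\, d\Lambda(s).$$

Letting $\Delta = 1$ and $Y = 0$, we obtain

$$(\beta - \tilde\beta)^T X(0) = \log\frac{\tilde\Lambda'(0)}{\Lambda'(0)}.$$

Then, condition (C1) gives $\beta = \tilde\beta$ and $\Lambda'(0) = \tilde\Lambda'(0)$. Because the equality (6) holds for any
X, by letting $X(s) = 0$, $s \in [0, \tau]$, and $\Delta = 0$, we obtain $\Lambda(y) = \tilde\Lambda(y)$. Finally, by choosing
$\Delta = 0$ and $Y = y$ and taking the logarithm and then the first derivative with respect to y
in (6), we obtain

$$\frac{e^{(\beta+\gamma)^T X(y)}\, \Lambda'(y)}{e^{\beta^T X(y)} F(y) + e^{\gamma^T X(y)} S(y)}
= \frac{e^{(\beta+\tilde\gamma)^T X(y)}\, \Lambda'(y)}{e^{\beta^T X(y)} F(y) + e^{\tilde\gamma^T X(y)} S(y)}.$$

Again condition (C1) gives $\gamma = \tilde\gamma$. The identifiability of the parameters $(\theta, \Lambda)$ is established.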
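The two evaluation points used in the identifiability argument, time zero and large times, are also what give the parameters their short-term and long-term interpretation. The following minimal sketch evaluates the hazard ratio implied by the model for a scalar, time-independent binary covariate. The unit-exponential baseline $\Lambda(t) = t$ and the function name `hazard_ratio` are assumptions made only for illustration.

```python
import math


def hazard_ratio(t, beta, gamma, cum_hazard=lambda s: s):
    """Hazard ratio of X = 1 versus X = 0 at time t under the model.

    Assumes a scalar, time-independent binary covariate; cum_hazard is the
    baseline cumulative hazard (unit exponential by default, an assumption).
    """
    surv = math.exp(-cum_hazard(t))  # baseline survival S(t) = exp(-Lambda(t))
    dist = 1.0 - surv                # baseline distribution F(t) = 1 - S(t)
    return math.exp(beta + gamma) / (math.exp(beta) * dist + math.exp(gamma) * surv)


beta, gamma = -0.5, 0.5
for t in (0.0, 0.5, 1.0, 5.0, 50.0):
    # The ratio starts at exp(beta) and approaches exp(gamma) as t grows.
    print(t, round(hazard_ratio(t, beta, gamma), 3))
```

With $\beta < 0 < \gamma$ the ratio starts below one and ends above one, which is the crossing-hazards pattern that the proportional hazards model cannot represent.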

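A.2. Proof of Theorem 1. The proof of consistency consists of two major steps. In the
first step, we prove that $\hat\Lambda_n(t)$ has an upper bound in $[0, \tau]$ with probability one. Therefore
there exists a subsequence of $(\hat\theta_n, \hat\Lambda_n)$ that converges to $(\theta^*, \Lambda^*)$. In the second step, we
prove that $\theta^* = \theta_0$ and $\Lambda^* = \Lambda_0$.

Step 1. We will prove the boundedness of $\hat\Lambda_n(\tau)$ by contradiction. Recall that the
nonparametric log-likelihood takes the form

$$l_n(\theta, \Lambda) = n P_n\left[R(O; \theta, \Lambda) + \Delta \log \Lambda\{Y\}\right],$$

where

$$R(O; \theta, \Lambda) = \Delta\left[(\beta+\gamma)^T X(Y) - \log\left\{e^{\beta^T X(Y)} F(Y) + e^{\gamma^T X(Y)} S(Y)\right\}\right]
- \int_0^Y \frac{e^{(\beta+\gamma)^T X(y)}}{e^{\beta^T X(y)} F(y) + e^{\gamma^T X(y)} S(y)}\, d\Lambda(y).$$

Define $\xi_n = \hat\Lambda_n(\tau)$ and $\tilde\Lambda_n(y) = \hat\Lambda_n(y)/\xi_n$. It is obvious that $\xi_n$ maximizes the function
$l_n(\hat\theta_n, \xi\tilde\Lambda_n)/n$. To prove that $\hat\Lambda_n$ is bounded in $[0, \tau]$, it is sufficient to prove that $\xi_n$ is bounded. It is
easy to see that

$$\begin{aligned}
0 \le{}& \frac{1}{n}\, l_n(\hat\theta_n, \xi_n\tilde\Lambda_n) - \frac{1}{n}\, l_n(\hat\theta_n, \tilde\Lambda_n) \\
={}& P_n\Bigg[\Delta \log \xi_n - \Delta \log\frac{e^{\hat\beta_n^T X(Y)} \hat F_n(Y) + e^{\hat\gamma_n^T X(Y)} \hat S_n(Y)}{e^{\hat\beta_n^T X(Y)} \tilde F_n(Y) + e^{\hat\gamma_n^T X(Y)} \tilde S_n(Y)} \\
& \quad - \int_0^Y \frac{e^{(\hat\beta_n+\hat\gamma_n)^T X(t)}}{e^{\hat\beta_n^T X(t)} \hat F_n(t) + e^{\hat\gamma_n^T X(t)} \hat S_n(t)}\, d\hat\Lambda_n(t)
+ \int_0^Y \frac{e^{(\hat\beta_n+\hat\gamma_n)^T X(t)}}{e^{\hat\beta_n^T X(t)} \tilde F_n(t) + e^{\hat\gamma_n^T X(t)} \tilde S_n(t)}\, d\tilde\Lambda_n(t)\Bigg],
\end{aligned}$$

where $(\hat F_n, \hat S_n)$ and $(\tilde F_n, \tilde S_n)$ are the distribution function and survival function corresponding
to $\hat\Lambda_n$ and $\tilde\Lambda_n$, respectively.

By conditions (C1) and (C4), we can show that

$$-\Delta \log\frac{e^{\hat\beta_n^T X(Y)} \hat F_n(Y) + e^{\hat\gamma_n^T X(Y)} \hat S_n(Y)}{e^{\hat\beta_n^T X(Y)} \tilde F_n(Y) + e^{\hat\gamma_n^T X(Y)} \tilde S_n(Y)} \le g_1,$$

where $g_1$ is a constant. Suppose that $\xi_n \to \infty$. According to conditions (C1) and (C4), we
have

$$-P_n\left[\int_0^Y \frac{e^{(\hat\beta_n+\hat\gamma_n)^T X(t)}}{e^{\hat\beta_n^T X(t)} \hat F_n(t) + e^{\hat\gamma_n^T X(t)} \hat S_n(t)}\, d\hat\Lambda_n(t)\right] \le -g_2 \xi_n + g_3$$

for some positive constants $g_2$ and $g_3$.

It follows that

$$0 \le \frac{1}{n}\, l_n(\hat\theta_n, \xi_n\tilde\Lambda_n) - \frac{1}{n}\, l_n(\hat\theta_n, \tilde\Lambda_n) \le \log \xi_n - g_2 \xi_n + g_3 \to -\infty$$

as $\xi_n \to \infty$. This contradicts the definition of $(\hat\theta_n, \hat\Lambda_n)$. Note that the above argument
holds for every sample in the probability space except a set with zero probability. Therefore
we have shown that, with probability one, $\hat\Lambda_n(\tau)$ is bounded for any sample size n.

Thus, by Helly's selection theorem, we can choose a further subsequence, still indexed by
{n}, such that $\hat\theta_n \to \theta^*$ and $\hat\Lambda_n$ weakly converges to $\Lambda^*$ with probability one.

Step 2. In this step, we will show that $\theta^* = \theta_0$ and $\Lambda^* = \Lambda_0$. By differentiating $l_n(\theta, \Lambda)$
with respect to $\Lambda\{Y_i\}$ and setting the derivative to zero, we can see that $\hat\Lambda_n\{Y_i\}$ satisfies the following
equation:

$$\hat\Lambda_n\{Y_i\} = \left.\frac{\Delta_i}{n P_n\left[I(Y \ge y)\, Q(y, O; \hat\theta_n, \hat\Lambda_n)\right]}\right|_{y = Y_i}, \qquad (7)$$

where

$$\begin{aligned}
Q(y, O; \theta, \Lambda) ={}& \frac{e^{(\beta+\gamma)^T X(y)}}{e^{\beta^T X(y)} F(y) + e^{\gamma^T X(y)} S(y)}
+ \frac{\Delta\, S(Y)\left\{e^{\beta^T X(Y)} - e^{\gamma^T X(Y)}\right\}}{e^{\beta^T X(Y)} F(Y) + e^{\gamma^T X(Y)} S(Y)} \\
& - \int_y^Y \frac{e^{(\beta+\gamma)^T X(s)}\, S(s)\left\{e^{\beta^T X(s)} - e^{\gamma^T X(s)}\right\}}{\left\{e^{\beta^T X(s)} F(s) + e^{\gamma^T X(s)} S(s)\right\}^2}\, d\Lambda(s).
\end{aligned}$$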
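As a sanity check on equation (7), it may help to look at the proportional hazards sub-model $\beta = \gamma$. The short derivation below assumes that $Q$ is obtained by differentiating the log-likelihood with respect to a jump of $\Lambda$, so that every term other than the leading ratio carries the factor $e^{\beta^T X} - e^{\gamma^T X}$. The index $j$ for the subjects at risk is notation introduced here for illustration.

```latex
% Special case beta = gamma (proportional hazards sub-model).
\begin{align*}
e^{\beta^T X(y)} F(y) + e^{\gamma^T X(y)} S(y)
  &= e^{\beta^T X(y)} \{F(y) + S(y)\} = e^{\beta^T X(y)}, \\
Q(y, O; \theta, \Lambda)
  &= \frac{e^{2\beta^T X(y)}}{e^{\beta^T X(y)}} = e^{\beta^T X(y)}
  \qquad \text{(terms with } e^{\beta^T X} - e^{\gamma^T X} \text{ vanish)}, \\
\hat\Lambda_n\{Y_i\}
  &= \frac{\Delta_i}{\sum_{j=1}^{n} I(Y_j \ge Y_i)\, e^{\beta^T X_j(Y_i)}} .
\end{align*}
```

In this special case the jump sizes reduce to those of the familiar Breslow-type estimator of the baseline cumulative hazard, which is consistent with the proportional hazards model being a sub-model.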

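In view of (7), we construct another step function $\tilde\Lambda_n(t)$ with jumps only at the observed
$Y_i$, with jump sizes satisfying

$$\tilde\Lambda_n\{Y_i\} = \left.\frac{\Delta_i}{n P_n\left[I(Y \ge y)\, Q(y, O; \theta_0, \Lambda_0)\right]}\right|_{y = Y_i}.$$

We verify that $\tilde\Lambda_n(t)$ converges to $\Lambda_0$ uniformly in $t \in [0, \tau]$ with probability one. In Appendix
A.4, we prove that the class

$$\mathcal{F}_1 = \{I(Y \ge y)\, Q(y, O; \theta, \Lambda) : y \in [0, \tau],\ \theta \in \mathcal{B}_0,\ \Lambda \in \mathcal{A},\ \Lambda(0) = 0\}$$

is a bounded and P-Donsker class, where $\mathcal{A} = \{g : g$ is a nondecreasing function in $[0, \tau]$, $g(\tau) \le B_0\}$
and $B_0$ is a positive constant such that $\hat\Lambda_n(\tau) \le B_0$ with probability one. Since
a P-Donsker class is also a Glivenko-Cantelli class, by the Glivenko-Cantelli theorem in
van der Vaart and Wellner (1996), $\tilde\Lambda_n(t)$ uniformly converges to $E[I(Y \le t)\Delta/\mu(Y)]$, where
$\mu(y) = E[I(Y \ge y)\, Q(y, O; \theta_0, \Lambda_0)]$.

Denoting by $S_C(\cdot \mid X)$ the survival function of the censoring time C given X, we have

$$\mu(y) = E\left[\frac{e^{(\beta_0+\gamma_0)^T X(y) - \Lambda_0(y \mid X(y))}\, S_C(y \mid X(y))}{e^{\beta_0^T X(y)} F_0(y) + e^{\gamma_0^T X(y)} S_0(y)}\right],$$

where $\Lambda_0(\cdot \mid X)$ is the true cumulative hazard function of T given X, $F_0$ is the true baseline
distribution function and $S_0$ is the true baseline survival function. Therefore,

$$E\left[\frac{I(Y \le t)\Delta}{\mu(Y)}\right]
= E\left[\int_0^t \frac{e^{(\beta_0+\gamma_0)^T X(y) - \Lambda_0(y \mid X(y))}\, S_C(y \mid X(y))}{\mu(y)\left\{e^{\beta_0^T X(y)} F_0(y) + e^{\gamma_0^T X(y)} S_0(y)\right\}}\, d\Lambda_0(y)\right]
= \Lambda_0(t).$$

Consequently, we conclude that $\tilde\Lambda_n$ uniformly converges to $\Lambda_0$ in $[0, \tau]$ with probability one.

By the construction of $\hat\Lambda_n(t)$ and $\tilde\Lambda_n(t)$, we can see that $\hat\Lambda_n(t)$ is absolutely continuous
with respect to $\tilde\Lambda_n(t)$ and

$$\hat\Lambda_n(t) = \int_0^t \frac{P_n\left[I(Y \ge y)\, Q(y, O; \theta_0, \Lambda_0)\right]}{P_n\left[I(Y \ge y)\, Q(y, O; \hat\theta_n, \hat\Lambda_n)\right]}\, d\tilde\Lambda_n(y). \qquad (8)$$

By taking limits on both sides of (8), we obtain that

$$\Lambda^*(t) = \int_0^t \frac{P\left[I(Y \ge y)\, Q(y, O; \theta_0, \Lambda_0)\right]}{P\left[I(Y \ge y)\, Q(y, O; \theta^*, \Lambda^*)\right]}\, d\Lambda_0(y).$$

Therefore, $\Lambda^*(t)$ is differentiable with respect to $\Lambda_0(t)$, so that $\Lambda^*(t)$ is differentiable with
respect to t. It follows that $d\hat\Lambda_n(t)/d\tilde\Lambda_n(t)$ converges to $d\Lambda^*(t)/d\Lambda_0(t)$ uniformly in $t \in [0, \tau]$.
Note that

$$0 \le n^{-1} l_n(\hat\theta_n, \hat\Lambda_n) - n^{-1} l_n(\theta_0, \tilde\Lambda_n)
= P_n\left[\Delta \log\frac{\hat\Lambda_n\{Y\}}{\tilde\Lambda_n\{Y\}}\right] + P_n\left[R(O; \hat\theta_n, \hat\Lambda_n) - R(O; \theta_0, \tilde\Lambda_n)\right]. \qquad (9)$$

Since $\mathcal{B}_0 \times \mathcal{A}$ is a Donsker class and the functionals $R(O; \theta, \Lambda)$ are bounded Lipschitz
functionals with respect to $\mathcal{B}_0 \times \mathcal{A}$, by the same arguments as in the proof of the Donsker property of
$\mathcal{F}_1$, the following class

$$\mathcal{F}_2 = \{R(O; \theta, \Lambda) : \theta \in \mathcal{B}_0,\ \Lambda \in \mathcal{A},\ \Lambda(0) = 0,\ \Lambda(\tau) \le B_0\}$$

is P-Donsker and hence a Glivenko-Cantelli class. Therefore by letting $n \to \infty$ in (9), we
have

$$0 \le P\left[\log\frac{\{\Lambda^{*\prime}(Y)\}^{\Delta}\, e^{R(O; \theta^*, \Lambda^*)}}{\{\Lambda_0'(Y)\}^{\Delta}\, e^{R(O; \theta_0, \Lambda_0)}}\right],$$

where the right-hand side is the negative Kullback-Leibler information. Then it follows that, with probability
one,

$$\{\Lambda^{*\prime}(Y)\}^{\Delta}\, e^{R(O; \theta^*, \Lambda^*)} = \{\Lambda_0'(Y)\}^{\Delta}\, e^{R(O; \theta_0, \Lambda_0)}.$$

Therefore, from the identifiability result proved earlier, we obtain $\theta^* = \theta_0$ and $\Lambda^* = \Lambda_0$.
This completes the proof of Theorem 1.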
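The passage from the inequality to the almost-sure equality of the two likelihoods rests on the standard fact that a negative Kullback-Leibler information cannot be positive. A brief sketch of that step follows, writing $p_*(O)$ and $p_0(O)$ (notation introduced here for illustration only) for the observed-data likelihood contributions at $(\theta^*, \Lambda^*)$ and at $(\theta_0, \Lambda_0)$; the factors involving the censoring distribution cancel in the ratio.

```latex
% Jensen's inequality applied to the concave function log.
\begin{align*}
P\left[\log\frac{p_*(O)}{p_0(O)}\right]
  \le \log P\left[\frac{p_*(O)}{p_0(O)}\right]
  \le \log 1 = 0 .
\end{align*}
% Combined with 0 <= P[log(p_*/p_0)], both inequalities are equalities.
% Since log is strictly concave, equality in Jensen's inequality forces
% p_*(O)/p_0(O) to be constant with probability one, and the constant is 1.
```

This is why the limit point must reproduce the true likelihood with probability one, after which the identifiability result of Lemma 1 applies.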



A.3. Proof of Theorem 2. We prove Theorem 2 by verifying the four conditions
in Theorem 3.3.1 of van der Vaart and Wellner (1996). For this purpose, we first define a
neighborhood of the true parameters $(\theta_0, \Lambda_0)$, denoted by

$$\mathcal{U} = \Big\{(\theta, \Lambda) : \|\theta - \theta_0\| + \sup_{t \in [0, \tau]} |\Lambda(t) - \Lambda_0(t)| < \epsilon_0\Big\},$$

for a very small constant $\epsilon_0$. Based on the consistency theorem, $(\hat\theta_n, \hat\Lambda_n)$ belongs to $\mathcal{U}$ with
probability close to 1 when the sample size n is large enough.

For any one-dimensional submodel given as $\{\beta + \epsilon h_1, \gamma + \epsilon h_2, \Lambda + \epsilon\int h_3\, d\Lambda\}$, $(\theta, \Lambda) \in \mathcal{U}$,
$H = (h_1, h_2, h_3) \in \mathcal{H}$, we can derive the score function for a single observation O,

$$\begin{aligned}
W(O; \theta, \Lambda)[H] ={}& \Delta\left[(h_1 + h_2)^T X(Y) + h_3(Y) - \frac{R_2(Y, O; \theta, \Lambda)[H]}{R_1(Y, O; \theta, \Lambda)}\right] \\
& - \int_0^Y \left[\frac{e^{(\beta+\gamma)^T X(y)}\left\{(h_1 + h_2)^T X(y) + h_3(y)\right\}}{R_1(y, O; \theta, \Lambda)}
- \frac{e^{(\beta+\gamma)^T X(y)}\, R_2(y, O; \theta, \Lambda)[H]}{R_1^2(y, O; \theta, \Lambda)}\right] d\Lambda(y),
\end{aligned} \qquad (10)$$

where $R_1(y, O; \theta, \Lambda) = e^{\beta^T X(y)} F(y) + e^{\gamma^T X(y)} S(y)$ and

$$R_2(y, O; \theta, \Lambda)[H] = e^{\beta^T X(y)}\left(F(y)\, h_1^T X(y) + S(y)\int_0^y h_3\, d\Lambda\right)
+ e^{\gamma^T X(y)} S(y)\left(h_2^T X(y) - \int_0^y h_3\, d\Lambda\right).$$

We define

$$U_n(\theta, \Lambda)[H] = P_n\{W(O; \theta, \Lambda)[H]\}$$

and

$$U(\theta, \Lambda)[H] = P\{W(O; \theta, \Lambda)[H]\}.$$

Thus, it is easy to see that $U_n(\theta, \Lambda)[H]$ and $U(\theta, \Lambda)[H]$ are both maps from $\mathcal{U}$ to $l^\infty(\mathcal{H})$ and
$\sqrt{n}\{U_n(\theta, \Lambda) - U(\theta, \Lambda)\}$ is an empirical process in the space $l^\infty(\mathcal{H})$. It is easy to see that
$U_n(\hat\theta_n, \hat\Lambda_n) = 0$ and $U(\theta_0, \Lambda_0) = 0$.

We shall prove the theorem by verifying the following four properties stated in Theorem
3.3.1 of van der Vaart and Wellner (1996):

(P1) $\sqrt{n}(U_n - U)(\hat\theta_n, \hat\Lambda_n) - \sqrt{n}(U_n - U)(\theta_0, \Lambda_0) = o_P\big(1 + \sqrt{n}\|\hat\theta_n - \theta_0\| + \sqrt{n}\sup_{y \in [0, \tau]} |\hat\Lambda_n(y) - \Lambda_0(y)|\big)$.

(P2) $\sqrt{n}(U_n - U)(\theta_0, \Lambda_0)$ converges to a tight random element $\xi$.

(P3) $U(\theta, \Lambda)$ is Fréchet-differentiable at $(\theta_0, \Lambda_0)$.

(P4) The derivative of $U(\theta, \Lambda)$ at $(\theta_0, \Lambda_0)$, denoted by $U'(\theta_0, \Lambda_0)$, is continuously invertible.

To prove property (P1), we make use of Lemma 3.3.5 of van der Vaart and Wellner (1996).
Based on the explicit expression in (10), $W(O; \theta, \Lambda)[H]$ is continuously differentiable with
respect to $\theta$ and

$$\left\|\frac{\partial W(O; \theta, \Lambda)}{\partial \theta}\right\| \le g_4,$$

where $g_4$ is a positive constant. Furthermore,

$$|W(O; \theta, \Lambda_1)[H] - W(O; \theta, \Lambda_2)[H]| \le g_5\left\{|\Lambda_1(Y) - \Lambda_2(Y)| + \int_0^\tau |\Lambda_1(y) - \Lambda_2(y)|\, dy\right\}$$

for some positive constant $g_5$. Therefore,

$$\sup_{H \in \mathcal{H}} E\left[\{W(O; \theta, \Lambda)[H] - W(O; \theta_0, \Lambda_0)[H]\}^2\right]$$

converges to zero if $\|\theta - \theta_0\| + \sup_{y \in [0, \tau]} |\Lambda(y) - \Lambda_0(y)| \to 0$. In addition, by the same
arguments as in the proof of the Donsker property of $\mathcal{F}_1$, the class

$$\mathcal{F}_3 = \{W(O; \theta, \Lambda)[H] - W(O; \theta_0, \Lambda_0)[H] : (\theta, \Lambda) \in \mathcal{U},\ H \in \mathcal{H}\}$$

is P-Donsker. Therefore, according to Lemma 3.3.5 of van der Vaart and Wellner (1996),
property (P1) holds.

Property (P2) holds again because of the P-Donsker property of the class

$$\{W(O; \theta_0, \Lambda_0)[H] : H \in \mathcal{H}\}.$$

Furthermore, the limit random element $\xi$ is a Gaussian process indexed by $H \in \mathcal{H}$, and the
covariance between $\xi(H_1)$ and $\xi(H_2)$ is equal to

$$E\left[W(O; \theta_0, \Lambda_0)[H_1] \times W(O; \theta_0, \Lambda_0)[H_2]\right].$$

The Fréchet differentiability in (P3) can be directly verified by using the smoothness of
$U(\theta, \Lambda)$. The derivative of $U(\theta, \Lambda)$ at $(\theta_0, \Lambda_0)$, denoted by $U'(\theta_0, \Lambda_0)$, is a map from the space

$$\{(\theta - \theta_0, \Lambda - \Lambda_0) : (\theta, \Lambda) \in \mathcal{U}\}$$

to $l^\infty(\mathcal{H})$.

It remains to show that $U'$ is continuously invertible at $(\theta_0, \Lambda_0)$. Following the argument
in the Appendix of Zeng and Lin (2007), it suffices to prove that for any one-dimensional
submodel given as $\{\beta_0 + \epsilon h_1, \gamma_0 + \epsilon h_2, \Lambda_0 + \epsilon\int h_3\, d\Lambda_0\}$, $H \in \mathcal{H}$, the Fisher information along
this submodel is nonsingular. If the Fisher information along this submodel is singular,
the score function along this submodel is zero with probability one. We will show that
$W(O; \theta_0, \Lambda_0)[H] = 0$ yields $h_1 = 0$, $h_2 = 0$, and $h_3 = 0$. We follow the ideas used to
prove identifiability in the proof of Lemma 1. Letting $\Delta = 1$ and $Y = 0$, we obtain
$h_1^T X(0) + h_3(0) = 0$. Condition (C1) gives $h_1 = 0$ and $h_3(0) = 0$. Letting $\Delta = 0$ and
$X(s) = 0$, $s \in [0, t]$, we obtain $\int_0^t h_3\, d\Lambda_0 = 0$ for any $t \in [0, \tau]$. Similarly, letting $\Delta = 1$ and
$X(s) = 0$, $s \in [0, t]$, we obtain $h_3(t) - \int_0^t h_3\, d\Lambda_0 = 0$. Therefore, $h_3(t) = 0$ for any $t \in [0, \tau]$.
Letting $\Delta = 0$ and $Y = y$ and then taking the first derivative with respect to y in $W(O; \theta_0, \Lambda_0)[H]$,
we obtain

$$h_2^T X(y)\, \frac{e^{\beta_0^T X(y)} F_0(y)}{R_1(y, O; \theta_0, \Lambda_0)} = 0$$

for any $y \in [0, \tau]$. Immediately, we have $h_2 = 0$. We have thus proved nonsingularity of the
Fisher information matrix along any nontrivial submodel. Hence, property (P4) holds.

Having verified properties (P1)-(P4), we conclude from Theorem 3.3.1 of van der Vaart and Wellner
(1996) that $\sqrt{n}(\hat\theta_n - \theta_0, \hat\Lambda_n - \Lambda_0)$ weakly converges to a tight Gaussian random
element $-U'^{-1}\xi$ in $l^\infty(\mathcal{H})$. Moreover, it can be shown that $\hat\theta_n$ is an asymptotically linear
estimator for $\theta_0$ and that the corresponding influence functions are in the space spanned by the
score functions. This implies that $\hat\theta_n$ is semiparametrically efficient by the semiparametric
efficiency theory (Bickel et al., 1993, Ch. 3).



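A.4. Donsker Property of $\mathcal{F}_1$. In this appendix, we prove that the following class

$$\mathcal{F}_1 = \{I(Y \ge y)\, Q(y, O; \theta, \Lambda) : y \in [0, \tau],\ \theta \in \mathcal{B}_0,\ \Lambda \in \mathcal{A},\ \Lambda(0) = 0\}$$

is P-Donsker. To show that $\mathcal{F}_1$ is P-Donsker, we first prove that the class

$$\mathcal{F} = \{Q(y, O; \theta, \Lambda) : y \in [0, \tau],\ \theta \in \mathcal{B}_0,\ \Lambda \in \mathcal{A},\ \Lambda(0) = 0,\ \Lambda(\tau) \le B_0\}$$

is P-Donsker. Using condition (C2), it is easy to show that $Q(y, O; \theta, \Lambda)$ is bounded and
continuously differentiable with respect to $\theta$ for any $\theta \in \mathcal{B}_0$ and

$$\left\|\frac{\partial Q(y, O; \theta, \Lambda)}{\partial \theta}\right\| \le g_6,$$

where $g_6$ is a positive constant. In addition, for any $\Lambda_1$ and $\Lambda_2 \in \mathcal{A}$ there exists a positive
constant $g_7$ such that

$$|Q(y, O; \theta, \Lambda_1) - Q(y, O; \theta, \Lambda_2)|
\le g_7\left\{|\Lambda_1(y) - \Lambda_2(y)| + |\Lambda_1(Y) - \Lambda_2(Y)| + \int_0^\tau |\Lambda_1(t) - \Lambda_2(t)|\, dt\right\}.$$

Therefore, by the mean-value theorem, we can show that for any $(y, \theta_1, \Lambda_1)$ and $(y, \theta_2, \Lambda_2)$ in
$[0, \tau] \times \mathcal{B}_0 \times \mathcal{A}$,

$$|Q(y, O; \theta_1, \Lambda_1) - Q(y, O; \theta_2, \Lambda_2)|
\le g_8\left\{\|\theta_1 - \theta_2\| + |\Lambda_1(y) - \Lambda_2(y)| + |\Lambda_1(Y) - \Lambda_2(Y)| + \int_0^\tau |\Lambda_1(t) - \Lambda_2(t)|\, dt\right\}$$

holds for a positive constant $g_8$. Since $[0, \tau] \times \mathcal{B}_0 \times \mathcal{A}$ and $\{H(y) : y \in [0, \tau],\ H \in \mathcal{A},\ H(0) = 0,\ H(\tau) \le B_0\}$
are both Donsker classes, we conclude that $\mathcal{F}$ is P-Donsker according to
Theorems 2.7.5 and 2.5.6 in van der Vaart and Wellner (1996) and the preservation of the
Donsker property under products and sums. Similarly, since $\{I(Y \ge y) : y \in [0, \tau]\}$
is P-Donsker, $\mathcal{F}_1$ is also P-Donsker.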
Table 1. Summary statistics for the simulation studies based on 1,000 replications

                          (beta, gamma) = (-0.5, 0.5)          (beta, gamma) = (-0.5, 0.0)
n     Par                 Est      SE      SEE     CP          Est      SE      SEE     CP
100   $\beta$            -0.511   0.413   0.412   0.956       -0.506   0.407   0.409   0.958
      $\gamma$            0.465   0.570   0.556   0.938       -0.04    0.564   0.568   0.944
      $\Lambda(0.5)$      0.508   0.086   0.085   0.946        0.507   0.085   0.086   0.962
      $\Lambda(1.0)$      1.019   0.145   0.146   0.954        1.019   0.146   0.149   0.959
      $\beta_{PH}$       -0.116   0.217   0.209   -           -0.317   0.219   0.211   0.926
      $\beta_{PO}$       -0.294   0.327   0.320   -           -0.511   0.326   0.322   0.944
200   $\beta$            -0.512   0.291   0.288   0.954       -0.507   0.287   0.286   0.955
      $\gamma$            0.496   0.400   0.389   0.940       -0.007   0.401   0.399   0.950
      $\Lambda(0.5)$      0.504   0.059   0.059   0.954        0.504   0.058   0.060   0.953
      $\Lambda(1.0)$      1.012   0.104   0.101   0.947        1.012   0.104   0.104   0.953
      $\beta_{PH}$       -0.107   0.153   0.147   -           -0.308   0.154   0.148   -
      $\beta_{PO}$       -0.287   0.231   0.225   -           -0.504   0.231   0.226   0.948

                          (beta, gamma) = (0.0, 0.5)           (beta, gamma) = (0.5, 0.5)
n     Par                 Est      SE      SEE     CP          Est      SE      SEE     CP
100   $\beta$            -0.012   0.406   0.405   0.954        0.495   0.414   0.409   0.945
      $\gamma$            0.490   0.570   0.563   0.934        0.512   0.587   0.585   0.947
      $\Lambda(0.5)$      0.510   0.087   0.085   0.952        0.509   0.087   0.087   0.954
      $\Lambda(1.0)$      1.023   0.146   0.148   0.958        1.027   0.147   0.151   0.959
      $\beta_{PH}$        0.188   0.211   0.210   -            0.499   0.216   0.214   0.952
      $\beta_{PO}$        0.202   0.321   0.319   -            0.707   0.327   0.325   -
200   $\beta$            -0.009   0.284   0.282   0.956        0.496   0.287   0.285   0.962
      $\gamma$            0.501   0.398   0.395   0.947        0.503   0.410   0.411   0.944
      $\Lambda(0.5)$      0.506   0.059   0.060   0.952        0.505   0.059   0.061   0.957
      $\Lambda(1.0)$      1.014   0.104   0.102   0.944        1.015   0.104   0.105   0.957
      $\beta_{PH}$        0.193   0.149   0.147   -            0.498   0.151   0.150   0.946
      $\beta_{PO}$        0.207   0.227   0.225   -            0.706   0.228   0.229   -

Par, the parameter to be estimated; Est, the average estimate; SE, the sample standard deviation
of the estimates; SEE, the average standard error; CP, the coverage probability of the nominal 95%
confidence intervals.
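The four summary columns can be computed from the replicate-level output of a simulation as follows. This is a minimal sketch, assuming Wald-type intervals of the form estimate plus or minus 1.96 standard errors (the note above only states that the intervals are nominal 95%); the function name `summarize` and its inputs are hypothetical.

```python
import statistics

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution


def summarize(estimates, std_errors, truth):
    """Est, SE, SEE and CP for one parameter across simulation replicates.

    estimates[r] and std_errors[r] are the point estimate and its estimated
    standard error in replicate r; truth is the value used to generate data.
    """
    covered = sum(
        1 for est, se in zip(estimates, std_errors)
        if abs(est - truth) <= Z_975 * se  # Wald interval contains the truth
    )
    return {
        "Est": statistics.mean(estimates),   # average estimate
        "SE": statistics.stdev(estimates),   # sample SD of the estimates
        "SEE": statistics.mean(std_errors),  # average standard error
        "CP": covered / len(estimates),      # empirical coverage
    }
```

Close agreement between SE and SEE, together with CP near 0.95, is what indicates that the variance estimator is performing adequately at a given sample size.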
Table 2. Empirical size/power of the Wald test at significance level of 0.05 based on 1,000
replications

$\beta$   $\gamma$    H1      H2      H3      H4      H5
 0.0       0.0        0.040   0.052   0.050   0.050   0.059
-0.5      -0.5        0.434   0.246   0.860   0.052   0.917
-0.5      -0.4        0.439   0.184   0.801   0.053   0.874
-0.5      -0.3        0.428   0.142   0.723   0.059   0.815
-0.5      -0.2        0.429   0.086   0.638   0.071   0.741
-0.5      -0.1        0.437   0.051   0.563   0.096   0.656
-0.5       0.0        0.438   0.050   0.499   0.137   0.544
-0.5       0.1        0.431   0.062   0.440   0.166   0.447
-0.5       0.2        0.437   0.089   0.396   0.220   0.345
-0.5       0.3        0.432   0.137   0.372   0.268   0.254
-0.5       0.4        0.428   0.189   0.363   0.328   0.179
-0.5       0.5        0.433   0.262   0.362   0.39?   0.129
-0.4       0.5        0.304   0.2??   0.267   0.341   0.074
-0.3       0.5        0.195   0.253   0.217   0.264   0.063
-0.2       0.5        0.104   0.266   0.191   0.228   0.075
-0.1       0.5        0.055   0.266   0.212   0.176   0.154
 0.0       0.5        0.044   0.263   0.272   0.139   0.265
 0.1       0.5        0.056   0.261   0.359   0.109   0.400
 0.2       0.5        0.099   0.245   0.476   0.086   0.563
 0.3       0.5        0.174   0.248   0.632   0.065   0.718
 0.4       0.5        0.308   0.231   0.742   0.057   0.840
 0.5       0.5        0.417   0.223   0.851   0.047   0.911
Table 3. Mean squared errors of the proposed NPMLEs and the pseudo maximum
likelihood estimators (PMLEs) of Yang and Prentice (2005) for $(\beta, \gamma)$

                            PMLE                NPMLE               PMLE/NPMLE
n     (beta, gamma)     $\beta$  $\gamma$    $\beta$  $\gamma$    $\beta$  $\gamma$
100   (-0.5, 0.5)       0.090    0.108       0.073    0.111       1.242    0.978
      (-0.5, 0.0)       0.085    0.114       0.061    0.105       1.390    1.084
      (0.0, 0.5)        0.069    0.107       0.063    0.110       1.101    0.967
      (0.5, 0.5)        0.088    0.144       0.067    0.133       1.314    1.087
200   (-0.5, 0.5)       0.048    0.060       0.036    0.054       1.360    1.107
      (-0.5, 0.0)       0.041    0.061       0.031    0.0543      1.310    1.119
      (0.0, 0.5)        0.030    0.050       0.030    0.0516      1.025    0.974
      (0.5, 0.5)        0.035    0.064       0.030    0.0598      1.152    1.068
Figure 1: Kaplan-Meier and model-fitted survival curves from the COGA study.